Skip to content

Conversation

@eerose2603
Copy link

@eerose2603 eerose2603 commented Oct 18, 2025

Summary

Honor the XObject /ColorSpace hint when deriving Pillow image modes in _get_mode_and_invert_color, especially when nested/indirect color spaces collapse or are ambiguous. This prevents mode mis-detection and preserves CMYK inversion semantics.

Motivation / Problem

In real-world PDFs the image XObject often carries a correct /ColorSpace, while the resolved color space provided to _get_imagemode(...) may be:

NullObject (e.g., after following references),

ICCBased without a usable /Alternate,

Indexed / Separation / DeviceN chains that don’t yield a direct Device* at the point of resolution.

Previously we:

Special-cased only /DeviceRGB, ignoring /DeviceGray and /DeviceCMYK.

Didn’t pass any prior/hinted mode into _get_imagemode(...).

Had no fallback if _get_imagemode(...) returned "".

This could yield the wrong final mode and miss CMYK inversion.

Approach

Compute preferred_mode from IA.COLOR_SPACE:

/DeviceRGB → "RGB", /DeviceGray → "L", /DeviceCMYK → "CMYK".

Pass preferred_mode to _get_imagemode(...) as prev_mode for both:

the <8 bpc branch and

the general branch (including the “2 components if not explicitly gray” rule).

If the derived mode is still "", fall back to preferred_mode.

If the fallback is CMYK, set invert_color=True (consistent with current CMYK handling elsewhere).

Effects / Compatibility

No change when _get_imagemode(...) already returns a concrete mode.

Only ambiguous/degenerate cases change—and they now align with the XObject’s declared /ColorSpace.

Keeps existing decode and flate/JPX handling untouched.

Tests

Added tests asserting that, when color_space collapses (e.g., NullObject), we still:

choose "RGB" for a /DeviceRGB hint (no invert),

choose "L" for /DeviceGray (no invert),

choose "CMYK" for /DeviceCMYK and set invert_color=True.

Preserve hinted color spaces when deriving XObject image modes
@stefan6419846
Copy link
Collaborator

Thanks for the PR. Could you please elaborate on the reasons for this change? Additionally, please have a look at the CI failures for us to continue.

@stefan6419846 stefan6419846 added needs-discussion The PR/issue needs more discussion before we can continue needs-change The PR/issue cannot be handled as issue and needs to be improved labels Oct 20, 2025
@eerose2603 eerose2603 changed the title Preserve hinted color spaces when deriving XObject image modes fix(images): honor /ColorSpace hint when deriving XObject image modes Oct 26, 2025
@eerose2603 eerose2603 changed the title fix(images): honor /ColorSpace hint when deriving XObject image modes BUG: honor XObject /ColorSpace hint when deriving image mode Oct 26, 2025
@eerose2603
Copy link
Author

What changed & why

_get_mode_and_invert_color now treats the XObject’s /ColorSpace entry as an authoritative hint for the final Pillow mode and propagates that hint through the existing _get_imagemode(...) logic:

We derive a preferred_mode from IA.COLOR_SPACE:

/DeviceRGB → "RGB", /DeviceGray → "L", /DeviceCMYK → "CMYK".

We pass preferred_mode as prev_mode to _get_imagemode(...) for both the <8 bpc path and the general path.
This ensures 1/2/4-bit and nested/indirect color spaces respect the hint.

If _get_imagemode(...) collapses to "" (e.g., NullObject, ambiguous Indexed/Separation/DeviceN chains), we fall back to the hint and set invert_color=True when the hint is CMYK—matching existing behavior where CMYK implies inversion.

Problem this solves

Some PDFs declare a reliable /ColorSpace on the image XObject, but the resolved color space passed to _get_imagemode(...) can be ambiguous or collapse (e.g., NullObject, ICCBased without usable Alternate, certain DeviceN/Indexed cases). Previously we only special-cased /DeviceRGB, so the function could return "" (unknown) and lose the author intent—leading to wrong modes (e.g., grayscale treated as RGB) or missed CMYK inversion.

Compatibility

If _get_imagemode(...) returns a concrete mode, nothing changes.

The hint is only used as a tie-breaker / fallback and to seed prev_mode, so it doesn’t override explicit resolvable color spaces.

@stefan6419846
Copy link
Collaborator

My above requests still hold through. Where do we have such an example? Could we please have a proper test for this? (I do not think that the possibly AI-generated initial and previous comment explain this issue in a suitable way to make it clear to me.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-change The PR/issue cannot be handled as issue and needs to be improved needs-discussion The PR/issue needs more discussion before we can continue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants